128 research outputs found
V-SVA: an R Shiny application for detecting and annotating hidden sources of variation in single-cell RNA-seq data.
SUMMARY: Single-cell RNA-sequencing (scRNA-seq) technology enables studying gene expression programs from individual cells. However, these data are subject to diverse sources of variation, including \u27unwanted\u27 variation that needs to be removed in downstream analyses (e.g. batch effects) and \u27wanted\u27 or biological sources of variation (e.g. variation associated with a cell type) that needs to be precisely described. Surrogate variable analysis (SVA)-based algorithms, are commonly used for batch correction and more recently for studying \u27wanted\u27 variation in scRNA-seq data. However, interpreting whether these variables are biologically meaningful or stemming from technical reasons remains a challenge. To facilitate the interpretation of surrogate variables detected by algorithms including IA-SVA, SVA or ZINB-WaVE, we developed an R Shiny application [Visual Surrogate Variable Analysis (V-SVA)] that provides a web-browser interface for the identification and annotation of hidden sources of variation in scRNA-seq data. This interactive framework includes tools for discovery of genes associated with detected sources of variation, gene annotation using publicly available databases and gene sets, and data visualization using dimension reduction methods.
AVAILABILITY AND IMPLEMENTATION: The V-SVA Shiny application is publicly hosted at https://vsva.jax.org/ and the source code is freely available at https://github.com/nlawlor/V-SVA.
CONTACT: [email protected] or [email protected].
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online
Detection of correlated hidden factors from single cell transcriptomes using Iteratively Adjusted-SVA (IA-SVA).
Single cell RNA-sequencing (scRNA-seq) precisely characterizes gene expression levels and dissects variation in expression associated with the state (technical or biological) and the type of the cell, which is averaged out in bulk measurements. Multiple and correlated sources contribute to gene expression variation in single cells, which makes their estimation difficult with the existing methods developed for batch correction (e.g., surrogate variable analysis (SVA)) that estimate orthogonal transformations of these sources. We developed iteratively adjusted surrogate variable analysis (IA-SVA) that can estimate hidden factors even when they are correlated with other sources of variation by identifying a set of genes associated with each hidden factor in an iterative manner. Analysis of scRNA-seq data from human cells showed that IA-SVA could accurately capture hidden variation arising from technical (e.g., stacked doublet cells) or biological sources (e.g., cell type or cell-cycle stage). Furthermore, IA-SVA delivers a set of genes associated with the detected hidden source to be used in downstream data analyses. As a proof of concept, IA-SVA recapitulated known marker genes for islet cell subsets (e.g., alpha, beta), which improved the grouping of subsets into distinct clusters. Taken together, IA-SVA is an effective and novel method to dissect multiple and correlated sources of variation in scRNA-seq data
BiFET: sequencing Bias-free transcription factor Footprint Enrichment Test.
Transcription factor (TF) footprinting uncovers putative protein-DNA binding via combined analyses of chromatin accessibility patterns and their underlying TF sequence motifs. TF footprints are frequently used to identify TFs that regulate activities of cell/condition-specific genomic regions (target loci) in comparison to control regions (background loci) using standard enrichment tests. However, there is a strong association between the chromatin accessibility level and the GC content of a locus and the number and types of TF footprints that can be detected at this site. Traditional enrichment tests (e.g. hypergeometric) do not account for this bias and inflate false positive associations. Therefore, we developed a novel post-processing method, Bias-free Footprint Enrichment Test (BiFET), that corrects for the biases arising from the differences in chromatin accessibility levels and GC contents between target and background loci in footprint enrichment analyses. We applied BiFET on TF footprint calls obtained from EndoC-βH1 ATAC-seq samples using three different algorithms (CENTIPEDE, HINT-BC and PIQ) and showed BiFET\u27s ability to increase power and reduce false positive rate when compared to hypergeometric test. Furthermore, we used BiFET to study TF footprints from human PBMC and pancreatic islet ATAC-seq samples to show its utility to identify putative TFs associated with cell-type-specific loci
Recommended from our members
Conceptualizing Cancer Drugs as Classifiers
Cancer and healthy cells have distinct distributions of molecular properties and thus respond differently to drugs. Cancer drugs ideally kill cancer cells while limiting harm to healthy cells. However, the inherent variance among cells in both cancer and healthy cell populations increases the difficulty of selective drug action. Here we formalize a classification framework based on the idea that an ideal cancer drug should maximally discriminate between cancer and healthy cells. More specifically, this discrimination should be performed on the basis of measurable cell markers. We divide the problem into three parts which we explore with examples. First, molecular markers should discriminate cancer cells from healthy cells at the single-cell level. Second, the effects of drugs should be statistically predicted by these molecular markers. Third, drugs should be optimized for classification performance. We find that expression levels of a handful of genes suffice to discriminate well between individual cells in cancer and healthy tissue. We also find that gene expression predicts the efficacy of some cancer drugs, suggesting that these cancer drugs act as suboptimal classifiers using gene profiles. Finally, we formulate a framework that defines an optimal drug, and predicts drug cocktails that may target cancer more accurately than the individual drugs alone. Conceptualizing cancer drugs as solving a discrimination problem in the high-dimensional space of molecular markers promises to inform the design of new cancer drugs and drug cocktails.</p
MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data
MAPPFinder is a tool that creates a global gene-expression profile across all areas of biology by integrating the annotations of the Gene Ontology (GO) Project with the free software package GenMAPP . The results are displayed in a searchable browser, allowing the user to rapidly identify GO terms with over-represented numbers of gene-expression changes. Clicking on GO terms generates GenMAPP graphical files where gene relationships can be explored, annotated, and files can be freely exchanged
Single Cell Analysis of Blood Mononuclear Cells Stimulated Through Either LPS or Anti-CD3 and Anti-CD28.
Immune cell activation assays have been widely used for immune monitoring and for understanding disease mechanisms. However, these assays are typically limited in scope. A holistic study of circulating immune cell responses to different activators is lacking. Here we developed a cost-effective high-throughput multiplexed single-cell RNA-seq combined with epitope tagging (CITE-seq) to determine how classic activators of T cells (anti-CD3 coupled with anti-CD28) or monocytes (LPS) alter the cell composition and transcriptional profiles of peripheral blood mononuclear cells (PBMCs) from healthy human donors. Anti-CD3/CD28 treatment activated all classes of lymphocytes either directly (T cells) or indirectly (B and NK cells) but reduced monocyte numbers. Activated T and NK cells expressed senescence and effector molecules, whereas activated B cells transcriptionally resembled autoimmune disease- or age-associated B cells (e.g., CD11c, T-bet). In contrast, LPS specifically targeted monocytes and induced two main states: early activation characterized by the expression of chemoattractants and a later pro-inflammatory state characterized by expression of effector molecules. These data provide a foundation for future immune activation studies with single cell technologies (https://czi-pbmc-cite-seq.jax.org/)
GenMAPP 2: New features and resources for pathway analysis
BACKGROUND: Microarray technologies have evolved rapidly, enabling biologists to quantify genome-wide levels of gene expression, alternative splicing, and sequence variations for a variety of species. Analyzing and displaying these data present a significant challenge. Pathway-based approaches for analyzing microarray data have proven useful for presenting data and for generating testable hypotheses. RESULTS: To address the growing needs of the microarray community we have released version 2 of Gene Map Annotator and Pathway Profiler (GenMAPP), a new GenMAPP database schema, and integrated resources for pathway analysis. We have redesigned the GenMAPP database to support multiple gene annotations and species as well as custom species database creation for a potentially unlimited number of species. We have expanded our pathway resources by utilizing homology information to translate pathway content between species and extending existing pathways with data derived from conserved protein interactions and coexpression. We have implemented a new mode of data visualization to support analysis of complex data, including time-course, single nucleotide polymorphism (SNP), and splicing. GenMAPP version 2 also offers innovative ways to display and share data by incorporating HTML export of analyses for entire sets of pathways as organized web pages. CONCLUSION: GenMAPP version 2 provides a means to rapidly interrogate complex experimental data for pathway-level changes in a diverse range of organisms
Parenthood—A Contributing Factor to Childhood Obesity
Prevalence of childhood obesity and its complications have increased world-wide. Parental status may be associated with children’s health outcomes including their eating habits, body weight and blood cholesterol. The National Health and Nutrition Examination Survey (NHANES) for the years 1988–1994, provided a unique opportunity for matching parents to children enabling analyses of joint demographics, racial differences and health indicators. Specifically, the NHANES III data, 1988–1994, of 219 households with single-parents and 780 dual-parent households were analyzed as predictors for primary outcome variables of children’s Body Mass Index (BMI), dietary nutrient intakes and blood cholesterol. Children of single-parent households were significantly (p < 0.01) more overweight than children of dual-parent households. Total calorie and saturated fatty acid intakes were higher among children of single-parent households than dual-parent households (p < 0.05). On average, Black children were more overweight (p < 0.04) than children of other races. The study results implied a strong relationship between single-parent status and excess weight in children. Further studies are needed to explore the dynamics of single-parent households and its influence on childhood diet and obesity. Parental involvement in the development of school- and community-based obesity prevention programs are suggested for effective health initiatives. Economic constraints and cultural preferences may be communicated directly by family involvement in these much needed public health programs
Tet2 Controls the Responses of β cells to Inflammation in Autoimmune Diabetes.
β cells may participate and contribute to their own demise during Type 1 diabetes (T1D). Here we report a role of their expression of Tet2 in regulating immune killing. Tet2 is induced in murine and human β cells with inflammation but its expression is reduced in surviving β cells. Tet2-KO mice that receive WT bone marrow transplants develop insulitis but not diabetes and islet infiltrates do not eliminate β cells even though immune cells from the mice can transfer diabetes to NOD/scid recipients. Tet2-KO recipients are protected from transfer of disease by diabetogenic immune cells.Tet2-KO β cells show reduced expression of IFNγ-induced inflammatory genes that are needed to activate diabetogenic T cells. Here we show that Tet2 regulates pathologic interactions between β cells and immune cells and controls damaging inflammatory pathways. Our data suggests that eliminating TET2 in β cells may reduce activating pathologic immune cells and killing of β cells
AMULET: a novel read count-based method for effective multiplet detection from single nucleus ATAC-seq data.
Detecting multiplets in single nucleus (sn)ATAC-seq data is challenging due to data sparsity and limited dynamic range. AMULET (ATAC-seq MULtiplet Estimation Tool) enumerates regions with greater than two uniquely aligned reads across the genome to effectively detect multiplets. We evaluate the method by generating snATAC-seq data in the human blood and pancreatic islet samples. AMULET has high precision, estimated via donor-based multiplexing, and high recall, estimated via simulated multiplets, compared to alternatives and identifies multiplets most effectively when a certain read depth of 25K median valid reads per nucleus is achieved
- …